



81] Formula for Drawing Two Correlated Curves. 165 



FORMULA FOR DRAWING TWO CORRELATED 
CURVES SO AS TO MAKE THE RESEMBLANCE 
AS CLOSE AS POSSIBLE. 

By Francis Todd H'Doubler. 



One of the problems with which statisticians are confronted almost 
daily is that of drawing two curves in such a way that the correspond- 
ence, so far as the eye can judge, shall be as close as possible. The 
statistician, as a rule, simply experiments with his scales, varying the
two units until the curves fall into some sort of correspondence, aban- 
doning his graphical illustration if the appearance of the two curves is 
not such as to warrant a positive conclusion. As is shown below, it 
is possible to derive a formula which will, without imposing too much 
labor upon the statistician, make this resemblance as close as could be 
desired. If, for instance, we are comparing yearly marriage rates with
yearly prices of wheat, the two groups will be brought into closest
correspondence if, after plotting the marriage rate, we multiply each
annual price of wheat by the expression

$$\frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} y_i^2},$$

where $x_1, x_2, \dots, x_N$ represent the annual marriage rates, and
$y_1, y_2, \dots, y_N$ represent the annual prices of wheat. By closest
correspondence, in the above statement, is meant such juxtaposition of
the two curves as will make the
sum of the squares of the distance between the two curves at the suc- 
cessive years a minimum. This definition of closest correspondence 
has much to be said in its favor on general mathematical grounds. If, 
however, we define closest correspondence as such juxtaposition as will 
make the sum of the distances between the two curves regardless of 
sign, a minimum, the problem becomes much more complex and the 
solution very laborious. In the great majority of instances, moreover, 
the adjustment of the two curves by the two methods would probably 
be practically identical. The problem has possibly been worked out 
before, but has never come to my attention; and the utility of the 
formula in practical statistical work warrants its reproduction, even if 
it has been previously published. 

Let

$$x_1, x_2, x_3, \dots, x_N$$

and

$$y_1, y_2, y_3, \dots, y_N$$

be the statistics of two correlated phenomena, where $x_i$ and $y_i$
correspond to the same value of the variable. Thus $x_1$ and $y_1$ might be
respectively the marriage rate and the average wheat price of a certain
year; $x_2$ and $y_2$ the corresponding figures for the next year, and so on.



166 American Statistical Association. [82 

If these figures are plotted, two curves are obtained which show in a 
way the correlation existing between the two phenomena. We wish to 
make the two diagrams fit as closely as possible. 

Plot the x's just as they occur. Then multiply each y by the quantity
$e$, and plot the resulting figures. We have the two sets

$$x_1, x_2, x_3, \dots, x_N$$

$$e y_1, e y_2, e y_3, \dots, e y_N$$

Consider the sum of the squares of the differences between corresponding
x's and y's. This quantity is

$$(x_1 - e y_1)^2 + (x_2 - e y_2)^2 + \dots + (x_N - e y_N)^2$$

or, in short,

$$\sum_{i=1}^{N} (x_i - e y_i)^2$$

It must be remembered that $x_i$ and $y_i$ are fixed numbers. Evidently

$$\sum_{i=1}^{N} (x_i - e y_i)^2 = f(e)$$

is a function of $e$, that is, it varies with $e$. Let us seek that value of
$e$ which causes this function to take its minimum value.

First of all we note that $\sum (x_i - e y_i)^2$ is a sum of squares and hence
of non-negative quantities. Consequently it is never negative. Moreover,
it increases beyond limit as $e$ increases or decreases without limit,
provided the y's are not all zero. Therefore, when the derivative of this
function is equated to zero we have the condition for a minimum (by
differential calculus).
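The claim that the stationary point is a minimum can be checked by expanding the square. The following is only a sketch; the labels $A$, $B$ and $C$ are introduced here for convenience and do not appear in the original.

```latex
\begin{aligned}
f(e) &= \sum_{i=1}^{N} (x_i - e y_i)^2
      = A - 2Be + Ce^2, \\
A &= \sum_{i=1}^{N} x_i^2, \qquad
B = \sum_{i=1}^{N} x_i y_i, \qquad
C = \sum_{i=1}^{N} y_i^2, \\
f''(e) &= 2C > 0 \quad \text{unless every } y_i = 0 .
\end{aligned}
```

Thus $f$ is a parabola in $e$ opening upward, and its single stationary point is its minimum.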

Differentiating and equating to zero, we have

$$-2(x_1 - e y_1) y_1 - 2(x_2 - e y_2) y_2 - \dots - 2(x_N - e y_N) y_N = 0$$

and solving for $e$

$$e = \frac{x_1 y_1 + x_2 y_2 + x_3 y_3 + \dots + x_N y_N}{y_1^2 + y_2^2 + \dots + y_N^2}$$

or

$$e = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} y_i^2} \qquad (1)$$



When the y's are multiplied by this value of $e$, the sum of the squares
of the differences $(x_i - e y_i)$ is a minimum, and the x-curve and the
modified y-curve fit as closely as this criterion allows.
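As an illustration of formula (1), the computation might be sketched in Python as follows. The function names and the figures are hypothetical, invented for this example only; they are not data from the article.

```python
def scale_factor(xs, ys):
    """Return e = sum(x_i * y_i) / sum(y_i ** 2), as in formula (1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    denom = sum(y * y for y in ys)
    if denom == 0:
        raise ValueError("the y's must not all be zero")
    return sum(x * y for x, y in zip(xs, ys)) / denom


def sum_sq(xs, ys, e):
    """Sum of squared differences between the x's and the scaled y's."""
    return sum((x - e * y) ** 2 for x, y in zip(xs, ys))


# Hypothetical figures, invented for illustration only.
rates = [8.0, 7.5, 8.2, 7.9]      # stand-in for annual marriage rates
prices = [1.10, 0.95, 1.20, 1.05]  # stand-in for annual wheat prices

e = scale_factor(rates, prices)
scaled_prices = [e * p for p in prices]  # the figures to plot against rates
```

Evaluating `sum_sq(rates, prices, e)` and comparing it with the value at a slightly larger or smaller multiplier gives a quick check that the computed `e` does no worse than its neighbours.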

In case the y's are unaltered while the x's are all multiplied by $e'$, the
value of $e'$ giving a minimum to $\sum (y_i - e' x_i)^2$ is similarly found to be

$$e' = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2} \qquad (2)$$

If $x_i = y_i$ for every $i$, then both $e$ and $e'$ are equal to unity.
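The two multipliers can be compared in a short Python sketch. The function names and the numbers are assumptions made for this example and are not taken from the article.

```python
def e_scaling_y(xs, ys):
    """Multiplier for the y's: sum(x*y) / sum(y*y), formula (1)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(y * y for y in ys)


def e_scaling_x(xs, ys):
    """Multiplier for the x's: sum(x*y) / sum(x*x), formula (2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Identical series (hypothetical values): both multipliers are unity.
same = [3.0, 5.0, 4.0]
e_same = e_scaling_y(same, same)
e_prime_same = e_scaling_x(same, same)

# Differing series (hypothetical values): the multipliers need not be reciprocals.
xs = [1.0, 2.0]
ys = [2.0, 1.0]
product = e_scaling_y(xs, ys) * e_scaling_x(xs, ys)
```

In the second case each multiplier is 4/5, so their product is 0.64 rather than 1. Scaling the y's by $e$ and scaling the x's by $e'$ are therefore different adjustments in general, although both reduce to unity when the two series coincide.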




One might ask why it would not be better to consider merely the
differences $(x_i - e y_i)$ regardless of sign, and to seek the value of $e$
causing the sum of these differences to assume its minimum. In other
words, consider the sum of the absolute values,* $|x_i - e y_i|$, i.e.

$$|x_1 - e y_1| + |x_2 - e y_2| + |x_3 - e y_3| + \dots + |x_N - e y_N|$$

or

$$\sum_{i=1}^{N} |x_i - e y_i|$$

This function is never negative. Its graph is a broken line with $N$
corners, a corner being located above each of the abscissae $e = x_i / y_i$
$(i = 1, \dots, N)$. One of these $N$ ratios gives $\sum_{i=1}^{N} |x_i - e y_i|$
its minimum. It must be found by actually substituting each ratio and
seeing which gives the above sum the smallest value. There is no general
formula applicable here, since the derivative is discontinuous at each of
these corners. This method involves so much tedious work that the first
method, with its general formula, is much more practicable, as well as
preferable.
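The substitution procedure described above is tedious by hand but short to express in code. The following Python sketch tries each ratio $x_i / y_i$ in turn. The function names and data are hypothetical, and pairs with a zero y are skipped, an assumption the text does not discuss.

```python
def sum_abs(xs, ys, e):
    """Sum of absolute differences between the x's and the scaled y's."""
    return sum(abs(x - e * y) for x, y in zip(xs, ys))


def best_ratio_by_substitution(xs, ys):
    """Try every corner e = x_i / y_i and keep the one with the smallest sum."""
    # Assumption: pairs with y_i == 0 yield no ratio and are skipped.
    candidates = [x / y for x, y in zip(xs, ys) if y != 0]
    if not candidates:
        raise ValueError("at least one y must be non-zero")
    return min(candidates, key=lambda e: sum_abs(xs, ys, e))


# Hypothetical figures, invented for illustration only.
xs = [2.0, 4.0, 6.0]
ys = [1.0, 2.0, 2.0]

e_abs = best_ratio_by_substitution(xs, ys)
```

Here the ratios are 2, 2 and 3. Substituting them gives sums of 2, 2 and 3, so the procedure selects `e_abs = 2.0`. When two ratios give the same sum, `min` returns the first one encountered.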

*In mathematical notation $|x|$ is used to indicate the absolute or numerical
value of $x$. Thus $|-10| = |10|$.



